Learning for stochastic dynamic programming

Authors

  • Sylvain Gelly
  • Jérémie Mary
  • Olivier Teytaud
Abstract

We present experimental results about learning function values (i.e. Bellman values) in stochastic dynamic programming (SDP). All results come from openDP (opendp.sourceforge.net), freely available source code, and can therefore be reproduced. The goal is an independent comparison of learning methods in the framework of SDP.

1 What is stochastic dynamic programming (SDP)?

We give here a very rough introduction to stochastic dynamic programming; the interested reader is referred to [1] for more details. Consider a dynamical system that evolves stochastically in time depending upon your decisions. Assume that time is discrete, with finitely many time steps, and that the total cost of your decisions is the sum of instantaneous costs. Precisely:

cost = c_1 + c_2 + ... + c_T,
c_i = c(i, x_i, d_i),
x_i = f(x_{i-1}, d_{i-1}, ω_i),
d_{i-1} = strategy(x_{i-1}, ω_i),

where x_i is the state at time step i, the ω_i are a random process, cost is to be minimized, and strategy is the decision function that has to be optimized.

Stochastic dynamic programming is a control problem: the element to be optimized is a function. It is based on the following principle: take the decision at time step t such that the sum of "the cost at time step t due to your decision" plus "the expected cost from time steps t+1 to T starting from the state resulting from your decision" is minimal. Bellman's optimality principle states that this strategy is optimal. Unfortunately, it can only be applied if the expected cost from time steps t+1 to T can be estimated as a function of the current state of the system and the decision. Bellman's optimality principle thus reduces the control problem to the computation of this function. If x_t can be computed from x_{t-1} and d_{t-1} (i.e., if f is known), this reduces to the computation of

V(t, x_t) = E[ c(t, x_t, d_t) + c(t+1, x_{t+1}, d_{t+1}) + ... + c(T, x_T, d_T) ].

Note that this function depends on the strategy. We consider this expectation for any optimal strategy (even if many strategies are optimal, V is uniquely determined). Stochastic dynamic programming is the computation of V backwards in time, thanks to the following equation:

V(t, x_t) = inf_{d_t} [ c(t, x_t, d_t) + V(t+1, x_{t+1}) ],

or equivalently

V(t, x_t) = inf_{d_t} [ c(t, x_t, d_t) + E_{ω_{t+1}} V(t+1, f(x_t, d_t, ω_{t+1})) ].
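The backward recursion above can be made concrete with a minimal sketch, assuming a toy problem with a discretized scalar state, a small finite decision set, and the expectation over ω_{t+1} approximated by averaging over a few sampled noise values; this is not the openDP implementation, and all identifiers (the toy cost c, transition f, and the grids) are illustrative:

    # Minimal sketch of finite-horizon stochastic dynamic programming (not openDP code).
    T = 5                                   # number of time steps
    states = [i / 10 for i in range(11)]    # discretized scalar state in [0, 1]
    decisions = [-0.1, 0.0, 0.1]            # finite decision set
    noises = [-0.05, 0.0, 0.05]             # sampled values of the random process omega

    def c(t, x, d):
        # Instantaneous cost c(t, x_t, d_t): distance to 0.5 plus a small control penalty.
        return (x - 0.5) ** 2 + 0.1 * abs(d)

    def f(x, d, w):
        # Transition x_{t+1} = f(x_t, d_t, omega_{t+1}), clipped to [0, 1].
        return min(1.0, max(0.0, x + d + w))

    def nearest_state(x):
        # Project a continuous next state back onto the discretization grid.
        return min(states, key=lambda s: abs(s - x))

    V = {T: {x: 0.0 for x in states}}       # terminal condition: no cost after the horizon
    policy = {}
    for t in reversed(range(T)):            # backward in time, as in Bellman's recursion
        V[t], policy[t] = {}, {}
        for x in states:
            best_d, best_val = None, float("inf")
            for d in decisions:
                # E_omega V(t+1, f(x, d, omega)) approximated by an average over sampled noises.
                future = sum(V[t + 1][nearest_state(f(x, d, w))] for w in noises) / len(noises)
                if c(t, x, d) + future < best_val:
                    best_d, best_val = d, c(t, x, d) + future
            V[t][x], policy[t][x] = best_val, best_d

    print("V(0, 0.0) =", round(V[0][0.0], 4), "- optimal first decision:", policy[0][0.0])

In the learning setting studied in the paper, the tabulated values V[t+1][.] used inside the recursion are replaced by a function approximator (a regressor over the state space), which is what the compared learning methods are used for.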

Similar articles

Stochastic Dynamic Programming with Markov Chains for Optimal Sustainable Control of the Forest Sector with Continuous Cover Forestry

We present a stochastic dynamic programming approach with Markov chains for optimal control of the forest sector. The forest is managed via continuous cover forestry and the complete system is sustainable. Forest industry production, logistic solutions and harvest levels are optimized based on the sequentially revealed states of the markets. Adaptive full system optimization is necessary for co...
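As a generic illustration of this kind of model (a minimal sketch only, not the authors' forest-sector formulation; every state, decision, and number below is invented for the example), finite-horizon dynamic programming over a Markov chain replaces sampled noise with explicit transition probabilities per decision:

    # Generic finite-horizon DP over a Markov chain (illustrative toy model only).
    T = 4
    market_states = ["low", "high"]
    decisions = ["harvest_little", "harvest_much"]

    # P[d][s][s2] = probability of moving to market state s2 given decision d in state s.
    P = {
        "harvest_little": {"low": {"low": 0.7, "high": 0.3}, "high": {"low": 0.4, "high": 0.6}},
        "harvest_much":   {"low": {"low": 0.8, "high": 0.2}, "high": {"low": 0.5, "high": 0.5}},
    }
    # r[d][s] = immediate profit of decision d in market state s (maximized here).
    r = {
        "harvest_little": {"low": 1.0, "high": 2.0},
        "harvest_much":   {"low": 1.5, "high": 4.0},
    }

    V = {T: {s: 0.0 for s in market_states}}   # no profit after the horizon
    policy = {}
    for t in reversed(range(T)):
        V[t], policy[t] = {}, {}
        for s in market_states:
            # Bellman recursion with an explicit expectation over the Markov transition.
            values = {d: r[d][s] + sum(P[d][s][s2] * V[t + 1][s2] for s2 in market_states)
                      for d in decisions}
            policy[t][s] = max(values, key=values.get)
            V[t][s] = values[policy[t][s]]

    print(V[0], policy[0])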


A Multi-Stage Single-Machine Replacement Strategy Using Stochastic Dynamic Programming

In this paper, the single-machine replacement problem is modeled within the frameworks of stochastic dynamic programming and a control-threshold policy, and some properties of the optimal values of the control thresholds are derived. Using these properties and by minimizing a cost function, the optimal values of two control thresholds for the time between productions of two successive nonco...


A Defined Benefit Pension Fund ALM Model through Multistage Stochastic Programming

We consider an asset-liability management (ALM) problem for a defined benefit pension fund (PF). The PF manager is assumed to follow a maximal fund valuation problem facing an extended set of risk factors: due to the longevity of the PF members, the inflation affecting salaries in real terms and future incomes, interest rates and market factors affecting jointly the PF liability and asset p...


Modelling and Decision-making on Deteriorating Production Systems using Stochastic Dynamic Programming Approach

This study aimed at presenting a method for formulating optimal production, repair and replacement policies. The system was based on the production rate of defective parts and machine repairs and then was set up to optimize maintenance activities and related costs. The machine is either repaired or replaced. The machine is changed completely in the replacement process, but the productio...


Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect

This paper deals with the determination of machine numbers and production schedules in manufacturing environments. To this end, a two-stage fuzzy-stochastic programming model with fuzzy processing times is discussed, in which both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...


Robust inter and intra-cell layouts design model dealing with stochastic dynamic problems

In this paper, a novel quadratic assignment-based mathematical model is developed for concurrent design of robust inter and intra-cell layouts in dynamic stochastic environments of manufacturing systems. In the proposed model, in addition to considering time value of money, the product demands are presumed to be dependent normally distributed random variables with known expectation, variance, a...

Publication date: 2006